Machine Learning Algorithms for Portuguese Named Entity Recognition

Authors

  • Ruy Luiz Milidiú
  • Julio C. Duarte
  • Roberto Cavalcante
Abstract

Named Entity Recognition (NER) is an important task in Natural Language Processing. It provides key features that help in more elaborate document management and information extraction tasks. In this paper, we propose seven machine learning approaches that use Hidden Markov Models (HMM), Transformation-Based Learning (TBL) and Support Vector Machines (SVM) to solve Portuguese NER. The performance of each modeling approach is empirically evaluated. The SVM-based extractor achieves an F-score of 88.11%, our best observed value and slightly better than TBL. This is very competitive with state-of-the-art extractors for similar Portuguese NER problems. Our HMM has reasonable precision and accuracy and does not require any additional expert knowledge, which is an advantage over the other approaches. The experimental results suggest that Machine Learning can be useful in Portuguese NER. They also indicate that HMM, TBL and SVM perform well in this natural language processing task.
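The abstract reports results as an F-score but does not spell out the evaluation protocol. As a minimal sketch, assuming exact-match, entity-level scoring (a common convention for NER, not confirmed by the abstract), the metric could be computed as follows. The function name `f_score` and the `(start, end, type)` span representation are hypothetical and chosen only for illustration.

```python
def f_score(true_entities, predicted_entities, beta=1.0):
    """Return (precision, recall, F-beta) for two collections of entity spans.

    Each entity is a hashable tuple such as (start, end, type). This span
    format is an assumption for illustration, not the paper's data format.
    A predicted entity counts as correct only if it matches a gold entity
    exactly, boundaries and type included.
    """
    true_set = set(true_entities)
    pred_set = set(predicted_entities)
    correct = len(true_set & pred_set)

    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(true_set) if true_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Toy data: three gold entities, three predictions, one with the wrong type.
gold = [(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG")]
pred = [(0, 2, "PER"), (5, 6, "ORG"), (9, 11, "ORG")]
precision, recall, f1 = f_score(gold, pred)
```

With `beta=1.0` the score is the harmonic mean of precision and recall, so a system must do well on both to score highly.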


Similar resources

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...


Learning Named Entity Recognition in Portuguese from Spanish

We present here a practical method for adapting a NER system for Spanish to Portuguese. The method is based on training a machine learning algorithm, namely a C4.5, using internal and external features. The external features are provided by a NER system for Spanish, while the internal features are automatically extracted from the documents. The experimental results show that the method performs...


Improving Persian Named Entity Recognition Using Kasre Ezafe

Named entity recognition is a process in which the names of people, places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), as well as dates, currencies and percentages in a text, are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...


Portuguese Language Processing Service

Current Natural Language Processing tools provide shallow semantics for textual data. This kind of knowledge could be used in the Semantic Web. In this paper, we describe F-EXT-WS, a Portuguese Language Processing Service that is now available on the Web. The first version of this service provides Part-of-Speech Tagging, Noun Phrase Chunking and Named Entity Recognition. All these tools were b...


Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...



Journal:
  • Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial

Volume 11

Publication date: 2007